Computational Expressivity of Transformers

The Parallelism Tradeoff: Understanding Transformer Expressivity Through Circuit Complexity

Computational Benefits and Limitations of Transformers and State-Space Models

Transformers, the tech behind LLMs | Deep Learning Chapter 5

Rethinking Attention with Performers (Paper Explained)

FLatten Transformer: Vision Transformer using Focused Linear Attention

SakanaAI Unveils 'Transformer Squared' - Test Time LEARNING

How do Vision Transformers work? – Paper explained | multi-head self-attention & convolutions

Cyril Zhang | How do Transformers reason? First principles via automata, semigroups, and circuits

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding (Paper Explained)

Kaggle Reading Group: Generating Long Sequences with Sparse Transformers | Kaggle

Gradient descent, how neural networks learn | Deep Learning Chapter 2

Theoretical Limitations of Multi-Layer Transformers

It Ain't Broke So D̶o̶n̶'t̶ F̶i̶x̶ Let's Break It

Clayton Sanford: Representational Strengths and Limitations of Transformers

Re-thinking Transformers: Searching for Efficient Linear Layers over a Continuous Space of...

NEW AI Models: Hierarchical Reasoning Models (HRM)

[ICFP'23] Modular Models of Monoids with Operations

Kaggle Reading Group: Generating Long Sequences with Sparse Transformers (Part 2) | Kaggle

OpenAI CLIP: Connecting Text and Images (Paper Explained)

FNet: Mixing Tokens with Fourier Transforms (Machine Learning Research Paper Explained)

Can Wikipedia Help Offline Reinforcement Learning? (Paper Explained)

Byte Latent Transformer: Patches Scale Better Than Tokens (Paper Explained)

CARTA: Computational Neuroscience and Anthropogeny with Terry Sejnowski

'Blueprints for a Universal Reasoning Machine' by Zenna Tavares (Strange Loop 2022)